A Very Fast and Momentum-Conserving Tree Code 



Walter Dehnen 

Max-Planck-Institut fur Astronomie, Konigstuhl 11, D- 69 117 Heidelberg, Germany 

dehnen@mp i a-hd . mpg . de 



ABSTRACT 

The tree code for the approximate evaluation of gravitational forces is extended and substantially ac- 
celerated by including mutual cell-cell interactions. These are computed by a Taylor series in Cartesian 
coordinates and in a completely symmetric fashion, such that Newton's third law is satisfied by con- 
struction and hence momentum exactly conserved. The computational effort is further reduced by 
exploiting the mutual symmetry of the interactions. For typical astrophysical problems with iV=10 5 
and at the same level of accuracy, the new code is about four times faster than the tree code. For large 
N, the computational costs are found to scale almost linearly with N, which can also be supported by 
a theoretical argument, and the advantage over the tree code increases with ever larger TV. 

Subject headings: methods: n-body simulations - methods: numerical - stellar dynamics 



1. Introduction 

The tree code (cf. Barnes & Hut 1986, hereafter B&H) 
has become an invaluable tool for the approximate but 
fast computation of the forces in studies of collisionless 
gravitational dynamics. It has been applied to a large 
variety of astrophysical problems. The gravitational po- 
tential generated by N bodies of masses fi n and at po- 
sitions X n is 

N 

$(X) = -J2 »n g(\x - x n \), (l) 

n=l 

where g(r) denotes the greens function, i.e. for un- 
softened gravity g(r) = G/r. The essence of the tree 
code is to approximate this sum over N terms by re- 
placing any partial sum over all bodies within a single 
cell which is well-separated from X by just one term. 
The inner structure of the cell is partly taken into ac- 
count using its multipole moments. This method re- 
duces the overall costs for the computation of all forces 
from 0(N 2 ) to O(NlogN). 

The tree code, however, does not exploit the fact that 
the force due to the contents of some cell is very similar 
at nearby positions (even though one may use the fact 
that nearby bodies tend to have very similar interaction 
lists, cf. Barnes 1990). Exploiting this is the idea of the 
fast multipole method (FMM) (Greengard & Rokhlin 



1987). The FMM employs a (usually) non-adaptive 
structure of hierarchical grids and considers only inter- 
actions between nodes on the same grid level according 
to their geometrical neighbourhood. The gravitational 
field due to some source cell and within some sink cell 
is approximated by a multipole expansion in spherical 
harmonics, the order of which is adapted to meet prede- 
fined accuracy limits. This method has been claimed to 
reduce the overall amount of operations to O(N), but 
the tables given by Cheng, Greengard & Rokhlin (1999) 
do not support this claim. Capuzzo-Dolcetta & Miocchi 
(1998) find that the FMM needs 0(N log TV) operations, 
and is significantly slower for astrophysical applications 
than the tree code at comparable accuracy. 

Instead of using a spherical multipole expansion of 
adaptive order, it is actually more efficient to use a 
Cartesian expansion of fixed order. Moreover, by pre- 
serving the symmetry of the gravitational interaction 
for mutual cell-cell interactions, one can (i) reduce the 
computational effort and (ii) obtain a code that satisfies 
Newton's third law by construction and hence results in 
exact conservation of momentum, a property not shared 
by the traditional tree code. 

2. Description of the Code 

We start as the B&H tree code with a hierarchical tree 
of cubic cells. Each cell has up to eight sub-nodes cor- 
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Fig. 1. — Two well-separated cells. The solid and dotted circles 
have radii r max and r max /0, respectively. 



responding to its octants. A node can be cither a single 
body or another cell. The trcc-building phase (cf. B&H) 
also includes the computation of the cells' masses, cen- 
ters Z of mass, and quadrupole moments. 

2.1. The Opening Criterion 

In order to benefit from the symmetry of the gravita- 
tional interaction, the opening criterion, which decides 
whether or not two nodes are well-separated so that a 
direct mutual interaction is acceptable, must be sym- 
metric, too. We employ an extension of the criterion 
used in the tree-code: nodes A and B are well-separated 
if 

\Z A -Z B \>( ' max A )/0, (2) 

where the opening angle 9 controls the accuracy of the 
code. r max is the radius of a sphere centered on the 
node's center of mass and encircling all bodies within 
it. Bodies naturally have r max = 0, i.e. two bodies are 
always well separated, while for the interaction between 
a body and a cell the criterion (2) reduces to that used in 
the tree code. Note that if one additionally to equation 
(2) requires r max A = 0, the standard tree code is re- 
covered, but the symmetry between A and B is broken. 

There exist two upper limits for the radius r max . One 
is the distance 6 max between the cell's center of mass, 
Z, and its most distant corner (Salmon & Warren 1994). 
The other is 



max {r 

11 

sub— nodes i 



+ \Zi - Z\} 



(3) 



(Benz et al. 1990). After computation of both these 
upper limits, we take the smaller one to be r max . For 
cells with only a few bodies like cell A in Fig. 1, the 
latter often gives values significantly smaller than 6 max , 
while for cells with many bodies, like cell B in Fig. 1, 
&max is the tighter limit. 



2.2. Approximating Gravity 

Consider two bodies at X and Y which reside in two 
well-separated cells A and B with centers of mass at, 
respectively, Za and Zb and separation R = Za — Zb ■ 
We may re- write X — Y = R+(x — y) with x = X — Z A 
and y = Y — Zb being small in magnitude compared to 
R (because the cells are well-separated, cf. Fig. 1). The 
Taylor expansion of g(\X — Y\) around R reads 



g(\X-Y\)= 1 £± i [[(x-v)-V\ p g(\r\ 



r=R 



(4) 



Separating powers of x from powers of y in equation (4) 
and subsequently taking the mass weighted sum over cell 
B yields a Cartesian multipole expansion of the potential 
^b->a(-^0 at any position X within cell A and due to 
all bodies inside cell B (Warren & Salmon 1995). Since 
Zb was chosen to be the center of mass of cell B, its 
dipolc vanishes. The highest-order multipole occurring 
in such an expansion may actually be omitted, since 
it only contributes a constant to the approximation for 
g and does not affect the approximation for V<? and 
hence the force. The expression of third order thus reads 
(without the octopole; using Einstein's sum convention) 



,(2) 



$b^a(X) « M B I Z><°> + iQsiM. 

2b j, 



T> {1) + in„ ,-d {3) 



(5) 
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where Mb and Qb are the mass and specific quadrupole 
moment 
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(7) 



^ijk = ($ijRk + fijkR-i + <>kiRj) D + RiRjRk D 



with 



(8) 



r=\R\ 



The symmetry between x and y at every order of the 
Taylor expansion in equation (4) has two important con- 
sequences. First, if this expansion is used to compute 
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Fig. 2. — CPU time consumption plotted versus the mean and 99% relative force error for the three test cases. The dots correspond, 
from left to right, to = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1. 



both $ b ^a(-X') and ^a—b^), Newton's third law is 
satisfied by construction. Note, that our omission of 
the octopole term broke the symmetry only in the ze- 
roth order and has no effect on the forces. 

Second, the expressions for Ma$b^a and Mb$a^b 
are very similar: the expansion coefficients of second and 
third order differ only by mere signs, such that comput- 
ing these coefficients for both Taylor series at one time 
is substantially faster than computing them at different 
times. 

The transformation, or shifting, of the expansion cen- 
ter to some other position is trivial compared to the 
analogous procedure in the FMM (see Cheng et al.). 

2.3. The Algorithm 

The standard tree code computes the forces on each 
body by a recursive tree walk, which visits each node 
exactly once as a gravity sink, and thus exhibits an in- 
herent asymmetry between sources and sinks. The new 
algorithm avoids this asymmetry. 

First, in the interaction phase, the Taylor series co- 
efficients are evaluated and accumulated in data fields 
associated with each node. This phase is based on the 
concept of mutual interactions (Mis), pairs of nodes, A 
and B, such that bodies in node A must receive forces 
from all bodies in node B, and vice versa. We start by 
the MI describing the root-root self-interaction, and pro- 
cess a given MI as follows. (I) A body self-interaction 
is ignored; (2) a cell self-interaction is split into the Mis 
between the sub-nodes 1 , and the process is continued 
on each of the new Mis; (3) a MI representing a well- 
separated pair of nodes is executed: the Taylor coef- 

1 ln a cubic oct-tree, these are at most 36 independent sub-Mis. 



ficients are computed and added to the nodes' corre- 
sponding data fields; (4) finally, in any other case, the 
node with larger r max is split, and up to eight new Mis 
are created and processed. 

Secondly, in the collection phase, the Taylor coeffi- 
cients are passed down the tree: the expansion center 
is shifted to the center of mass of the currently active 
cell and the coefficients are accumulated. The Taylor 
expansion is evaluated at the position of any body and 
the values for potential and acceleration arc added to 
its data fields (which may already contain contributions 
accumulated during the interaction phase). 

3. Performance Tests 

We tested the new algorithm and compared it with the 
tree code in three typical astrophysical situations: (I) a 
spherical Plummcr model, representing a rather homo- 
geneous stellar system, (2) a spherical Hernquist-modcl 
(1990) galaxy, and (3) a group of five such galaxies with 
various masses and scale radii. We generated 10 5 ran- 
dom initial positions from each of these cases, truncat- 
ing the density at 1000 scale radii, and evaluated the 
exact mutual forces at all positions and the approxi- 
mated forces due to the tree code (up to quadrupole 
order) and the new code for opening angles 6 between 
0.2 and 1. We used an optimally chosen softening with 
the biweight softening kernel (see Dehnen 2000), but the 
results are insensitive to these settings. Both approxi- 
mate methods have been coded by the author 2 and use 

2 At the same 9, the tree code is twice as fast as a code publicly 
available from J. Barnes, mainly because the new opening criterion 
leads to fewer interactions. However, even at the same number of 
interactions, the author's code was about 30% faster. 
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the same opening criterion (§2.1). In order to measure 
the accuracy of the approximated forces, we evaluated 
(cf. Capuzzo-Dolcetta & Miocchi 1998) 



K - a n P \/ a n P , 



(9) 



where a n denotes the magnitude of the acceleration of 
the nth body due to either of the approximate methods 
and a pp that of the exact computation. 

Figure 2 plots the CPU time needed for the force 
approximation on a Pentium III/500Mhz PC versus the 
mean relative error and that at the 99 percentile. For the 
new code, the time consumption scales almost inversely 
with the error, while the tree code flattens off 3 at 9 > 
0.7. Evidently, at an acceptable level of accuracy, e.g. 
£ 99% — 0.01, the new code is about four times faster 
than the tree code, even though it requires a smaller 
value of 9 (0.5 as compared to 0.7 for the tree code). 

For the test case of the Hcrnquist model, Figure 3 
plots the CPU time consumption per body versus N 
at fixed 9 = 0.7 for the tree code and 0.5 for the new 
code. The tree code shows the well-known log N scaling, 
while for N > 10 5 the new code requires only a constant 
amount of CPU time per body. This can be explained 
as follows. Arranging eight root cells to a new root box, 
increases N to 8N and the number Nj of interactions 
to 8(Ni + N + ), where 8-/V + interactions are needed to 
compute the forces between the former root cells. Thus, 



diV/ _ AT, AlogTVj _ TV/ + 7V+/ln8 
A/V ~ ~N A log A" ~~ N ' 



(10) 



In the tree code, N + oc TV yielding Nj oc N log N. In the 
new code, N+ may be estimated to contain two contri- 
butions, a constant term accounts for the interactions 
with distant nodes, and a term oc TV 2 / 3 for those on 
the surface of the former root cells. Inserting this into 
equation (10) yields 



TV/ oc N - Cl N 2/3 - c 2 



(11) 



with constants c\ and c 2 that depend on 9. Thus, at 
large TV, a linear relation is approached. Note that this 
argument differs from that given for the FMM by Green- 
gard & Rokhlin (1987), who assumed that the resolution 
may remain fixed when increasing N. 



3 This is, because of r max is not proportional to the cell size, so 
that increasing 9 from 0.7 does not much decrease the number of 
interactions. 
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Fig. 3. — CPU time per body plotted versus N for test case 2 
(Hernquist model) 



4. Discussion 

A new code for the approximate evaluation of gravita- 
tional forces has been presented, tested, and compared 
to the tree code. This new code is substantially faster 
than the tree code. Moreover, unlike the latter, it satis- 
fies Newton's third law by construction, such that any 
A^-body code based on it will not introduce spurious net- 
accelerations. The new code is based on a Taylor expan- 
sion of the greens function in Cartesian coordinates and 
incorporates mutual cell-cell interactions. The simple 
algorithm is well suited for implementation on parallel 
computers: different mutual interactions (Mis) can be 
passed to different CPUs. 

The scaling of the CPU time required for the mutual 
forces of a number N of bodies becomes essentially lin- 
ear at N > 10 5 , so that with ever larger A^ the new 
code is increasingly faster than the tree code, allowing 
for a substantial improvement in simulations employing 
large number of bodies. The only disadvantage is the 
increased requirement of memory compared to the stan- 
dard tree code: 20 floating point numbers per cell are 
needed to hold the Taylor expansion coefficients. (By 
using a tree-walking algorithm instead of that given in 
§2.3, one can avoid this at the price of enhanced CPU 
time consumption.) 

In spirit, the new code is similar to Grecngard & 
Rokhlin's (1987) fast multipolc method, but is more effi- 
cient because it (i) uses a Cartesian instead of a spherical 
harmonic multipole expansion and (ii) fixes the order of 
the expansion while controlling the accuracy via the in- 
teraction condition, rather than fixing the interactions 
and adapting the expansion order to the accuracy. 
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A concern with codes based on cell-cell interactions 
is their performance in the presence of individual time 
steps. Clearly, when not all the forces are to be com- 
puted, such codes fare less favorably. However, when the 
forces for all bodies within some domain are desired, the 
new code is still a significant improvement over the tree 
code. 

The new code has been written in C++ and will be 
electronically available from the author upon request. 

The author thanks Tom Quinn for valuable discus- 
sions and Joshua Barnes, Lars Hernquist, Junichiro 
Makino, Andy Nelson, and Thorsten Naab for useful 
comments. 
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